Variable selection in model-based clustering: A general variable role modeling
Authors
Abstract
Existing variable selection procedures in model-based clustering assume that the irrelevant clustering variables are either all independent of, or all linked with, the relevant clustering variables. We propose a more versatile variable selection model in which each variable can play one of three roles: relevant for clustering; irrelevant for clustering but dependent on a subset of the relevant variables; or irrelevant for clustering and totally independent of all the relevant variables. A model selection criterion and a variable selection algorithm are derived for this new variable role modeling. The identifiability of the model and the consistency of the variable selection criterion are also established. Numerical experiments illustrate the benefits of this new modeling.
Key-words: Relevant, redundant or independent variables; Variable selection; Model-based clustering; Linear regression; BIC
∗ Université Paris-Sud 11, Projet select
† INRIA Saclay Île-de-France, Projet select, Université Paris-Sud 11
‡ UMR AgroParisTech/INRA MIA 518, Paris
§ URGV UMR INRA 1165, CNRS 8114, UEVE, Evry
French title: Sélection de variables pour la classification non supervisée par mélanges gaussiens : une modélisation générale du rôle des variables
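The three variable roles named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' selection algorithm: it assumes the relevant variables are already known, regresses each candidate variable on all of them (the paper allows a subset), and uses a plain BIC comparison between a linear regression and an independent Gaussian to guess whether the candidate is redundant or independent. All function names and simulation settings are hypothetical.

```python
import numpy as np


def gaussian_bic(residuals, n_params):
    """BIC = 2 * max log-likelihood - n_params * log(n) for Gaussian residuals."""
    n = residuals.shape[0]
    sigma2 = np.mean(residuals ** 2)  # maximum-likelihood variance estimate
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return 2.0 * loglik - n_params * np.log(n)


def bic_regression(y, X):
    """BIC of y regressed linearly on the columns of X, with an intercept."""
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Parameters: regression coefficients (incl. intercept) plus the variance.
    return gaussian_bic(y - design @ coef, design.shape[1] + 1)


def bic_independent(y):
    """BIC of y modeled as a single Gaussian (mean and variance)."""
    return gaussian_bic(y - y.mean(), 2)


def guess_role(y, relevant):
    """Label an irrelevant variable as 'redundant' or 'independent' by BIC."""
    if bic_regression(y, relevant) > bic_independent(y):
        return "redundant"
    return "independent"


def simulate(n=400, seed=0):
    """Toy data: two relevant variables, one redundant, one independent."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    means = np.array([[-3.0, -3.0], [3.0, 3.0]])
    relevant = means[labels] + rng.normal(size=(n, 2))  # carries the clusters
    redundant = 1.0 + 2.0 * relevant[:, 0] + rng.normal(scale=0.5, size=n)
    independent = rng.normal(size=n)  # unrelated to the relevant variables
    return relevant, redundant, independent


relevant, redundant, independent = simulate()
for name, y in [("redundant", redundant), ("independent", independent)]:
    print(name, "->", guess_role(y, relevant))
```

The BIC convention used here (larger is better) is one common choice; with the opposite sign convention the comparison in `guess_role` would flip.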
Similar resources
Multivariate Estimation of Rock Mass Characteristics Respect to Depth Using ANFIS Based Subtractive Clustering- Khorramabad- Polezal Freeway Tunnels
Combination of Adaptive Network-based Fuzzy Inference System (ANFIS) and subtractive clustering (SC) has been used for estimation of deformation modulus (Em) and rock mass strength (UCSm) considering depth of measurement. To do this, learning of the ANFIS based subtractive clustering (ANFISBSC) was performed firstly on 125 measurements of 9 variables such as rock mass strength (UCSm), deformati...
Item Response Theory Modeling for Microarray Gene Expression Data
The high dimensionality of global gene expression profiles, where number of variables (genes) is very large compared to the number of observations (samples), presents challenges that affect generalizability and applicability of microarray analysis. Latent variable modeling offers a promising approach to deal with high-dimensional microarray data. The latent variable model is based on a few late...
Investigating the Effects of Economic Shocks with housing Finance in a DSGE Model
With the introduction of any given variable in random dynamic general equilibrium models, the behavioral functions of economic agents will change, and consequently the shocks to the economy show a different direction and response. One of the most important sectors in the dynamics of a country's economy is the housing sector and one of the main determinants of the stagnation and boom of ...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets
Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...
Journal title:
- Computational Statistics & Data Analysis
Volume 53
Pages -
Publication date: 2009